Towards Power Efficiency on Task-Based, Decoupled Access-Execute Models

Authors

  • Konstantinos Koukos
  • David Black-Schaffer
  • Vasileios Spiliopoulos
  • Stefanos Kaxiras
Abstract

This work demonstrates the potential of hardware and software optimization to improve the effectiveness of dynamic voltage and frequency scaling (DVFS). For software, we decouple data prefetch (access) and computation (execute) to enable optimal DVFS selection for each phase. For hardware, we use measurements from state-of-the-art multicore processors to accurately model the potential of per-core, zero-latency DVFS. We demonstrate that the combination of decoupled access-execute and precise DVFS has the potential to decrease EDP by 25-30% without reducing performance. The underlying insight in this work is that by decoupling access and execute we can take advantage of the memory-bound nature of the access phase and the compute-bound nature of the execute phase to optimize power efficiency. For the memory-bound access phase, where we prefetch data into the cache from main memory, we can run at a reduced frequency and voltage without hurting performance. Thereafter, the execute phase can run much faster, thanks to the prefetching of the access phase, and achieve higher performance. This decoupled program behavior allows us to achieve more effective use of DVFS than standard coupled executions which mix data access and compute. To understand the potential of this approach, we measure application performance and power consumption on a modern multicore system across a range of frequencies and voltages. From this data we build a model that allows us to analyze the effects of per-core, zero-latency DVFS. The results of this work demonstrate the significant potential for finer-grain DVFS in combination with DVFS-optimized software.
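To make the EDP argument concrete, the following is a minimal toy sketch, not the authors' measurement-based model. It assumes the textbook approximation that dynamic power scales as C·V²·f, that memory-stall time does not depend on core frequency, and that DVFS transitions are free (the "zero-latency" case). All names (`OperatingPoint`, `dynamic_power`, `phase_time`, `edp`) and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    """A hypothetical DVFS setting: core frequency (GHz) and supply voltage (V)."""
    freq_ghz: float
    volt: float


def dynamic_power(op: OperatingPoint, c_eff: float = 1.0) -> float:
    """Textbook CMOS approximation P = C * V^2 * f, in arbitrary units."""
    return c_eff * op.volt ** 2 * op.freq_ghz


def phase_time(compute_gcycles: float, mem_time_s: float, op: OperatingPoint) -> float:
    """Compute time scales with 1/f; memory-stall time is assumed frequency-independent."""
    return compute_gcycles / op.freq_ghz + mem_time_s


def edp(phases: list[tuple[float, float]]) -> float:
    """Energy-delay product for a sequence of (time, power) phases."""
    total_time = sum(t for t, _ in phases)
    total_energy = sum(t * p for t, p in phases)
    return total_energy * total_time


# Illustrative (made-up) operating points and workload.
F_MAX = OperatingPoint(freq_ghz=3.0, volt=1.2)
F_MIN = OperatingPoint(freq_ghz=1.5, volt=0.9)
COMPUTE_GCYCLES = 3.0   # work done by the execute phase
ACCESS_GCYCLES = 0.15   # address-generation overhead of the access phase
MEM_TIME_S = 1.0        # time spent waiting on main memory

# Coupled execution: memory stalls and compute both run at the highest setting.
coupled = edp([(phase_time(COMPUTE_GCYCLES, MEM_TIME_S, F_MAX), dynamic_power(F_MAX))])

# Decoupled execution: memory-bound access phase at low f/V,
# compute-bound execute phase (data already cached) at high f/V.
decoupled = edp([
    (phase_time(ACCESS_GCYCLES, MEM_TIME_S, F_MIN), dynamic_power(F_MIN)),
    (phase_time(COMPUTE_GCYCLES, 0.0, F_MAX), dynamic_power(F_MAX)),
])

print(f"EDP ratio (decoupled / coupled): {decoupled / coupled:.2f}")
```

The sketch ignores static power and any overlap between memory access and computation, so the ratio it prints should not be compared with the 25-30% figure reported in the abstract; it only shows why lowering voltage and frequency during a memory-bound phase can reduce EDP.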


Related papers

ACACES A Decoupled Access/Execute Architecture for Mobile GPUs

Smartphones are emerging as one of the fastest growing markets, providing enhanced capabilities every few months. However, supporting these hardware/software improvements comes at the cost of reducing the operating time per battery charge. The GPU is only left with a shrinking fraction of the power budget, but the trend towards better screens will inevitably lead to a higher demand for improved...


Decoupled Access-Execute on ARM big.LITTLE

Energy-efficiency plays a significant role given the battery lifetime constraints in embedded systems and hand-held devices. In this work we target the ARM big.LITTLE, a heterogeneous platform that is dominant in the mobile and embedded market, which allows code to run transparently on different microarchitectures with individual energy and performance characteristics. It allows to use more ene...


Profiling-Assisted Decoupled Access-Execute

As energy efficiency became a critical factor in the embedded systems domain, dynamic voltage and frequency scaling (DVFS) techniques have emerged as means to control the system’s power and energy efficiency. Additionally, due to the compact design, thermal issues become prominent. State-of-the-art work promotes software decoupled access-execution (DAE) that statically generates code amenable to...


Backhaul-Aware Decoupled Uplink and Downlink User Association, Subcarrier Allocation, and Power Control in FiWi HetNets

Decoupling the uplink and downlink user association improves the throughput of heterogeneous networks (HetNets) and balances the traffic load of macro- and small- base stations. Recently, fiber-wireless HetNets (FiWi-HetNets) have been considered as viable solutions for access networks. To improve the accuracy of user association and resource allocation algorithms in FiWi-HetNets, the capacity ...


Deriving Efficient Data Movement from Decoupled Access/Execute Specifications

On multi-core architectures with software-managed memories, effectively orchestrating data movement is essential to performance, but is tedious and error-prone. In this paper we show that when the programmer can explicitly specify both the memory access pattern and the execution schedule of a computation kernel, the compiler or run-time system can derive efficient data movement, even if analysi...




Publication date: 2013